Introduction

This exploratory analysis examines two questions: how many dimensions are present in our Statistical Performance Indicators (SPI), and whether our quintile-based approach to grouping countries is sensible.

We apply two methods. First, we use principal components analysis (PCA) to examine how many relevant dimensions are present across our 54 SPI indicators. Second, we compare our SPI quintile groups to groups formed using K means clustering.

Takeaways:

Principal Components Analysis

We start by assessing the dimensionality of the SPI indicator data, applying principal components analysis to our 54 SPI indicators using data from 2019. We begin with a scree plot, which shows the share of variation explained by each principal component and helps us decide how many components to keep. There appears to be a single dominant principal component, with a second dimension that is less important but still makes up 10% of the variation.
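To make the scree plot idea concrete, the following is a minimal sketch of how the share of variation explained by the leading principal components could be computed. It is not the code behind the figures in this note. It assumes the indicators are standardized before PCA (so the analysis runs on the correlation matrix), uses power iteration with deflation rather than a full eigendecomposition, and runs on synthetic stand-in data rather than the actual SPI indicators. All function and variable names are hypothetical.

```python
import math
import random

def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def correlation_matrix(rows):
    """Correlation matrix of the columns (covariance of standardized data)."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def top_eigenpair(c, rng, iters=500):
    """Power iteration: largest eigenvalue and unit eigenvector of c."""
    p = len(c)
    v = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iters):
        w = [sum(c[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v for the unit vector v
    lam = sum(v[a] * c[a][b] * v[b] for a in range(p) for b in range(p))
    return lam, v

def variance_shares(rows, k=2, seed=0):
    """Share of total variance explained by each of the first k components."""
    c = correlation_matrix(rows)
    p = len(c)
    total = sum(c[j][j] for j in range(p))  # trace = total variance
    rng = random.Random(seed)
    shares = []
    for _ in range(k):
        lam, v = top_eigenpair(c, rng)
        shares.append(lam / total)
        # Deflate: remove the component just found before looking for the next
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return shares

# Synthetic stand-in data (NOT SPI data): 40 "countries", 6 "indicators"
# driven by one common factor plus noise.
data_rng = random.Random(1)
toy = []
for _ in range(40):
    factor = data_rng.gauss(0, 1)
    toy.append([factor + 0.5 * data_rng.gauss(0, 1) for _ in range(6)])

shares = variance_shares(toy, k=2)
print([round(s, 3) for s in shares])
```

Plotting these shares against the component number gives the scree plot. In practice a linear algebra library would compute all components at once; the standard-library version is used here only to keep the sketch self-contained.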

 

 

Next, we look at how well the 1st principal component maps to our SPI overall score. The fit is quite strong: the R-squared from a regression of the SPI overall score on the 1st principal component is around 0.96.

This is some evidence that our SPI overall score is picking up the bulk of the variation between countries, since it is highly correlated with a dominant 1st principal component.
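As a rough illustration of the fit statistic quoted above, the sketch below computes the R-squared from a simple linear regression of one variable on another. The values for the 1st principal component and the overall score are made up for illustration and are not SPI data; the function name r_squared is hypothetical.

```python
def r_squared(x, y):
    """R-squared from an ordinary least squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Illustrative values only: 1st principal component and overall score
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
overall_score = [30.0, 42.0, 50.0, 61.0, 70.0]

r2 = r_squared(pc1, overall_score)
print(round(r2, 3))
```

With a single regressor, this R-squared equals the squared correlation between the two variables, so it can be read as a direct measure of how closely the overall score tracks the 1st principal component.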

 

 

K Means clustering

Next, we assess whether our SPI quintile groups are sensible groupings of countries. To do this, we compare our SPI groups to those formed using K means clustering. The K means clustering algorithm partitions the observations (in our case, countries) into k groups so as to minimize the within-group variance across our 54 indicators.
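The K means procedure described above can be sketched as follows. This is a minimal standard-library version of Lloyd's algorithm, not the routine used to produce the results in this note. As an assumption made to keep the example deterministic, the initial centers are chosen with a farthest-first rule instead of the random starts that statistical packages typically use. The data are toy points, and all names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic initial centers (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate between assigning points to the
    nearest center and moving each center to the mean of its members."""
    centers = [list(c) for c in farthest_first(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Total within-cluster sum of squares, the quantity K means minimizes
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss

# Toy data: two well-separated groups of three points each
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]

labels1, centers1, wss1 = kmeans(points, k=1)
labels2, centers2, wss2 = kmeans(points, k=2)
print(labels2, round(wss1, 2), round(wss2, 2))
```

Comparing the within-cluster sum of squares across several values of k is one common way to judge how many clusters the data support, alongside the visual inspection used in this note.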

In the analysis below, we explore both the optimal number of clusters and whether our SPI groupings based on the quintile of the SPI overall score match these K means clusters.

We examine different numbers of clusters below and visually inspect which seems to provide sensible clusters. We try k = 8, 5, 4, and 3.
 

 

Clustering using 5 clusters

 

 

Clustering using 4 clusters

 

 

Clustering using 3 clusters

 

 

Compare k means clustering to SPI Overall Score Quintiles

Finally, we show visually how our SPI quintile groups compare with some of the results from K means clustering. In general, our SPI quintile groups line up closely with the 1st PCA dimension, whereas the K means clusters also pick up differences along the 2nd PCA dimension, which causes some differences between the two groupings.
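Alongside the visual comparison, the overlap between the two groupings could be tabulated directly. The sketch below assigns quintile groups by rank of the overall score and cross-tabulates them against cluster labels. The scores and cluster labels are invented for illustration, the rank-based quintile rule is an assumption about how the groups are formed, and the names are hypothetical.

```python
from collections import Counter

def quintile_groups(scores):
    """Assign each score to a group from 1 (lowest fifth) to 5
    (highest fifth) based on its rank among all scores."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * 5 // n + 1
    return groups

# Illustrative overall scores and cluster labels for 10 hypothetical countries
scores = [35.0, 88.5, 52.1, 61.3, 47.9, 93.2, 70.4, 28.6, 79.8, 66.0]
clusters = [0, 2, 0, 1, 1, 2, 2, 0, 2, 1]

groups = quintile_groups(scores)

# Cross-tabulation: number of countries in each (quintile, cluster) pair
table = Counter(zip(groups, clusters))
for (quintile, cluster), count in sorted(table.items()):
    print(f"quintile {quintile}, cluster {cluster}: {count}")
```

If the two groupings agreed perfectly, each quintile would fall entirely within one cluster. Quintiles that are split across clusters point to countries where the two approaches disagree.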

We can visually observe how well our SPI quintile groupings fit with the first two PCA dimensions.

Countries shaded in dark red are the lowest performing; countries shaded in dark green are the highest performing. Countries are grouped into five groups: